-
Notifications
You must be signed in to change notification settings - Fork 2k
[ENH] For chroma cloud efs, extract api key from header if available to authenticate #5914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
994803a to
a92603d
Compare
|
Add shared client API key discovery for Chroma Cloud embeddings Introduces a static helper on Key Changes• Added Affected Areas• This summary was automatically generated by @propel-code-bot |
| except Exception: | ||
| # if we can't access the ServerAPI instance or it doesn't have _session, | ||
| # continue to the next system instance | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Reliability] The broad except Exception silently catches all errors, including critical issues like AttributeError from API changes or KeyError from malformed data structures. This makes debugging difficult when the code fails.
except AttributeError as e:
# ServerAPI doesn't have expected attributes (_session or _api_url)
logger.debug(f"Skipping system {system_id}: {e}")
continue
except Exception as e:
# Unexpected errors should be logged for investigation
logger.warning(f"Unexpected error extracting API key from system: {e}")
continueThis distinguishes expected structural variations from genuine errors that need attention.
Context for Agents
The broad `except Exception` silently catches all errors, including critical issues like `AttributeError` from API changes or `KeyError` from malformed data structures. This makes debugging difficult when the code fails.
```python
except AttributeError as e:
# ServerAPI doesn't have expected attributes (_session or _api_url)
logger.debug(f"Skipping system {system_id}: {e}")
continue
except Exception as e:
# Unexpected errors should be logged for investigation
logger.warning(f"Unexpected error extracting API key from system: {e}")
continue
```
This distinguishes expected structural variations from genuine errors that need attention.
File: chromadb/api/shared_system_client.py
Line: 133| # If not found in env var, try to get it from existing client instances | ||
| if not self.api_key: | ||
| raise ValueError(f"The {api_key_env_var} environment variable is not set.") | ||
| from chromadb.api.shared_system_client import SharedSystemClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not do inline imports. Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of circular dependencies. The path is essentially chromadb.api -> schema -> embedding function -> qwen ef -> shared system client -> chromadb.api
we would need to either refactor where the embedding function protocol is defined which is a breaking change, or move the ServerAPI.
An alternative would be to refactor the embedding function protocol to allow build_from_config to take a client, but that's a much larger refactor, and would also be a breaking change for custom efs
I'm not able to think of another way to get the client to auto propagate keys to the ef. we could also have an optional parameter to the ef that takes in a client, but it removes the ease of use, and its weird to pass in a client to the ef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to lazy import similar to how schema and types.py import
fb83644 to
51d83a3
Compare
51d83a3 to
fb90210
Compare
chromadb/utils/embedding_functions/chroma_cloud_splade_embedding_function.py
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
fb90210 to
db92fb8
Compare
chromadb/utils/embedding_functions/chroma_cloud_qwen_embedding_function.py
Outdated
Show resolved
Hide resolved
db92fb8 to
fb6b061
Compare
fb6b061 to
541c25a
Compare
| if len(api_keys) > 1: | ||
| logger.info( | ||
| f"Multiple Chroma Cloud clients found, using API key starting with {api_keys[0][:8]}..." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Security] Logging a prefix of the API key poses a security risk. Sensitive credentials, even partial ones, should not be written to logs as they could be exposed. A more secure approach is to log that multiple keys were found without revealing any part of the key itself.
Context for Agents
Logging a prefix of the API key poses a security risk. Sensitive credentials, even partial ones, should not be written to logs as they could be exposed. A more secure approach is to log that multiple keys were found without revealing any part of the key itself.
File: chromadb/api/shared_system_client.py
Line: 1753f4fee4 to
3cfe00d
Compare
| from enum import Enum | ||
|
|
||
|
|
||
| def _get_shared_system_client(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is every ef defining this when it could be shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought it would be clearer since its for import purposes and it was a 2 liner. updated so splade ef reads from qwen ef's impl. thought about moving to its own file, but didnt think this was worth introducing an ef helper file
3cfe00d to
0f0fd2b
Compare
| from chromadb.base_types import SparseVector | ||
| import os | ||
| from typing import Union | ||
| from chromadb.utils.embedding_functions.chroma_cloud_qwen_embedding_function import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense for this to import from another embedding function. I think this should just be in a seperate file.
0f0fd2b to
62022f8
Compare

Description of changes
Summarize the changes made by this PR.
api.trychroma.comTest plan
How are these changes tested?
Added tests for extraction function
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?